[Opt](ai-func) Improving AI function performance #62494
linrrzqqq wants to merge 4 commits into apache:master from
Conversation
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
run buildall
FE UT Coverage Report: Increment line coverage
FE Regression Coverage Report: Increment line coverage
/review
Found 1 blocking issue.
be/src/exprs/function/ai/embed.h: text embedding batching breaks Gemini resources. `_execute_text_embed()` now batches multiple prompts into one `build_embedding_request(inputs, ...)` call, but `GeminiAdapter::build_embedding_request()` still serializes them into a single `content` object and `parse_embedding_response()` still returns a single embedding for the text path. Running `embed()` on multiple rows with a GEMINI AI resource will now fail the cardinality check (expected N, got 1) or effectively embed only the last input. Please either keep Gemini on the old per-row path or implement Gemini's true batch text-embedding protocol before enabling batching here.
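One low-risk way to address this, per the suggestion above, is to keep Gemini on a per-row path while batching everything else. A minimal sketch, assuming a hypothetical `supports_batch_embedding()` capability flag (not the actual adapter API):

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the adapter interface discussed above.
struct Adapter {
    bool batch_ok;
    bool supports_batch_embedding() const { return batch_ok; }
};

// Split inputs into adapter-sized requests: one request carrying all inputs
// when the provider supports batch embedding, otherwise one request per
// input row (the old per-row path that stays correct for Gemini).
std::vector<std::vector<std::string>> plan_embed_requests(
        const Adapter& adapter, const std::vector<std::string>& inputs) {
    std::vector<std::vector<std::string>> requests;
    if (adapter.supports_batch_embedding()) {
        requests.push_back(inputs);        // single batched request
    } else {
        for (const auto& in : inputs) {    // per-row fallback
            requests.push_back({in});
        }
    }
    return requests;
}
```

A capability flag keeps the dispatch logic provider-agnostic instead of hard-coding a Gemini check in the execution path.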
Critical checkpoint conclusions:
- Goal and correctness: The PR aims to improve AI-function performance via batching. That is only partially achieved because multi-row text `embed()` is no longer correct for all supported providers. Existing tests do not cover the failing Gemini text-embedding path.
- Scope/minimality: The change is focused, but the generic text-embedding batching applies to providers with different protocol semantics.
- Concurrency: No new thread-safety or locking issue identified; the path remains synchronous.
- Lifecycle/static init: No special lifecycle or static-initialization issue found.
- Configuration: `multimodal_embed_max_batch_file_count` is added and forwarded to BE correctly through `TQueryOptions`.
- Compatibility: No storage-format or persistence compatibility issue found; FE/BE query-option propagation looks complete for the new variable.
- Parallel paths: Multimodal embedding and string AI functions were updated, but the provider-specific Gemini text-embedding path was not handled consistently.
- Special conditions/checks: The new multimodal input validation is reasonable.
- Test coverage: Unit coverage improved, but there is no test for multi-row `embed()` with a GEMINI resource, which is the broken path here.
- Test result files: Not applicable.
- Observability: Existing observability is sufficient for this review; no blocker here.
- Transaction/persistence/data writes/FE-BE variable passing: Not applicable beyond query-option forwarding, which is covered.
- Performance: Batching should help supported providers, but this regression must be fixed first.
- Other issues: No additional blocking issue confirmed beyond the Gemini regression.
Force-pushed 831a82e to 75d0015
run buildall
FE Regression Coverage Report: Increment line coverage
BE UT Coverage Report: Increment line coverage / Increment coverage report
BE Regression && UT Coverage Report: Increment line coverage / Increment coverage report
/review
I found one blocking issue.
- Critical checkpoint: Goal / correctness / tests
  Conclusion: The PR goal is to batch AI function requests for better performance, and the BE unit tests cover several batch-splitting and parsing cases. However, one provider-specific compatibility path is broken, so the implementation does not fully accomplish the goal safely.
- Critical checkpoint: Scope and focus
  Conclusion: The change is reasonably focused on AI batching plus the needed session/thrift plumbing.
- Critical checkpoint: Concurrency
  Conclusion: No new concurrency or locking risks were introduced in the modified paths. The execution remains row-batch local inside function evaluation.
- Critical checkpoint: Lifecycle / static initialization
  Conclusion: No applicable lifecycle or static-initialization issues found in the changed code.
- Critical checkpoint: Configuration changes
  Conclusion: `embed_max_batch_size` is added as a forwarded session variable and wired into `TQueryOptions`. FE forwarding looks complete for the new option.
- Critical checkpoint: Compatibility / protocol changes
  Conclusion: FE/BE protocol compatibility for `embed_max_batch_size` is handled, but provider API compatibility is not fully preserved: Gemini embedding now emits a batched `requests` payload without any corresponding endpoint normalization or compatibility handling.
- Critical checkpoint: Parallel code paths
  Conclusion: Text AI functions and EMBED were both updated for batching. No missing sibling path stood out beyond the Gemini provider-specific compatibility issue below.
- Critical checkpoint: Special condition checks
  Conclusion: Input-size validation added for multimodal requests is appropriate and clear.
- Critical checkpoint: Test coverage
  Conclusion: BE unit tests were expanded well for batch parsing/splitting and multimodal request shapes. But there is no test proving Gemini works end-to-end against the configured endpoint shape after this request-format change.
- Critical checkpoint: Test result files
  Conclusion: Not applicable; no regression `.out` files changed.
- Critical checkpoint: Observability
  Conclusion: No new observability additions were needed for this scope.
- Critical checkpoint: Transaction / persistence
  Conclusion: Not applicable.
- Critical checkpoint: Data writes / atomicity
  Conclusion: Not applicable.
- Critical checkpoint: FE-BE variable passing
  Conclusion: The new `embed_max_batch_size` variable is forwarded to BE correctly.
- Critical checkpoint: Performance
  Conclusion: The batching approach should significantly improve throughput, but the Gemini compatibility regression blocks safe acceptance.
- Critical checkpoint: Other issues
  Conclusion: No other blocking issue confirmed beyond the inline finding.
Because of the Gemini compatibility regression, I’m requesting changes.
run buildall
FE UT Coverage Report: Increment line coverage
FE Regression Coverage Report: Increment line coverage
BE UT Coverage Report: Increment line coverage / Increment coverage report
BE Regression && UT Coverage Report: Increment line coverage / Increment coverage report
/review
I found two blocking regressions in the new batching work.
- `append_parsed_text_result()` now treats any scalar text wrapped in `[]` as a batch array and errors out if it is not valid JSON. That breaks legitimate scalar outputs for existing functions; for example, `ai_mask` explicitly asks the model to emit `[MSKED]`, and `ai_agg` still routes arbitrary scalar text through `adapter->parse_response()`.
- Gemini embed batching now always builds a `batchEmbedContents`-style request body (`requests` array), but `normalize_endpoint()` only rewrites bare `.../v1` / `.../v1beta` roots. Existing AI resources configured with the documented explicit `...:embedContent` endpoint will now send the wrong payload to the old method and fail.
Critical checkpoints:
- Goal: batch AI requests for better performance. The PR largely achieves that for the happy path, but the two regressions above break existing valid inputs/configurations.
- Scope/focus: mostly focused, though the shared response parser changed behavior for non-batched paths too (`ai_agg`).
- Concurrency: no new shared-state concurrency issue found in the reviewed paths.
- Lifecycle/static init: no new lifecycle or SIOF issue found.
- Config: `embed_max_batch_size` is added and forwarded to BE correctly, but I did not see FE-side range validation.
- Compatibility: Gemini embed endpoint compatibility is regressed for users with explicit `:embedContent` resources.
- Parallel paths: the shared parser change also affects non-batched callers like `ai_agg`.
- Tests: the new BE tests cover happy-path batching well, but they miss bracketed scalar outputs and explicit-Gemini-endpoint compatibility.
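For the explicit-endpoint case, the normalization could rewrite a configured `:embedContent` suffix to the batch method so the URL matches the batched `requests` payload. This is a hedged sketch: the function name and the exact Gemini method suffixes are assumptions, not the PR's actual `normalize_endpoint()`.

```cpp
#include <string>

// If a resource endpoint ends with the documented single-item method
// ":embedContent", rewrite it to the batch method so it matches a
// batched `requests` body. Other endpoints pass through unchanged.
std::string normalize_gemini_embed_endpoint(std::string endpoint) {
    const std::string single = ":embedContent";
    const std::string batch = ":batchEmbedContents";
    if (endpoint.size() >= single.size() &&
        endpoint.compare(endpoint.size() - single.size(), single.size(), single) == 0) {
        endpoint.replace(endpoint.size() - single.size(), single.size(), batch);
    }
    return endpoint;
}
```

Because `:batchEmbedContents` does not itself end in `:embedContent`, the rewrite is safe to apply repeatedly.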
```cpp
// provider response -> choices[0].message.content = "[\"1\",\"0\",\"1\"]"
// this helper -> appends "1", "0", "1" into `results`
static Status append_parsed_text_result(std::string_view text,
                                        std::vector<std::string>& results) {
```
Consider: maybe we can also use `std::vector<std::string_view>` to reduce string copies.
There is a lifetime issue; at least one copy is required.
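The lifetime concern above can be illustrated as follows: `string_view` slices are only valid while the parsed payload buffer is alive, so the helper has to materialize each result exactly once before the buffer goes away. A minimal sketch (the span-based signature is hypothetical, not the PR's helper):

```cpp
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// The parsed JSON text (and any string_views into it) only lives for the
// duration of parsing, so the results must own their bytes. Sketch of the
// "copy exactly once" compromise: slice with string_view, materialize once.
std::vector<std::string> copy_once(
        const std::string& payload,
        const std::vector<std::pair<size_t, size_t>>& spans) {
    std::vector<std::string> results;
    results.reserve(spans.size());
    for (auto [pos, len] : spans) {
        std::string_view slice(payload.data() + pos, len); // zero-copy view
        results.emplace_back(slice);                       // the single copy
    }
    return results;
}
```

Returning `std::vector<std::string_view>` instead would leave the views dangling once the response buffer is freed, which is the issue raised in the reply.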
/review
Blocking issues found in the current implementation.
Findings
- be/src/exprs/function/ai/ai_functions.h: `_init_from_resource()` now uses `DORIS_CHECK` for missing AI metadata / unknown resource names / unsupported providers, which turns ordinary query errors into BE process aborts instead of returning `Status`.
- be/src/exprs/function/ai/ai_adapter.h: `append_parsed_text_result()` rejects valid pretty-printed JSON arrays, so batched scalar functions can fail on valid model output.
- be/src/exprs/function/ai/ai_adapter.h + be/src/exprs/function/ai/embed.h: Gemini multimodal batching loses the original `content_type` and rewrites many valid files to the wrong MIME type.
- fe/fe-core/src/main/java/org/apache/doris/qe/SessionVariable.java: the new session variables are forwarded without FE-side validation, while BE silently coerces invalid values to different defaults.
- be/src/exprs/function/ai/ai_functions.h vs be/src/exprs/aggregate/aggregate_function_ai_agg.h: endpoint normalization was added only to scalar AI functions; `ai_agg` still uses raw endpoints, so parallel AI paths now behave differently for the same resource.
Critical Checkpoints
- Goal / proof: The PR goal is to improve AI-function throughput by batching requests and adding session controls. The happy-path UT coverage is decent for batch sizing and endpoint normalization, but it does not cover the blocking regressions above.
- Scope / focus: The refactor is localized, but it changes request/response contracts, endpoint handling, and error semantics, so the effective behavioral scope is broader than the diff suggests.
- Concurrency: No new lock-order or concurrent mutation issue stood out in the touched paths.
- Lifecycle / static initialization: No new static-init issue found. `ai_agg` still depends on externally setting its static `QueryContext*`, which remains fragile but unchanged.
- Configuration: New query/session knobs were added, but invalid values are accepted on the FE side and silently rewritten on the BE side, so session state can diverge from real execution behavior.
- Compatibility / parallel paths: Scalar AI functions now normalize Gemini/base-version and legacy completions endpoints, but `ai_agg` does not, so equivalent resources no longer behave consistently across parallel AI code paths.
- Special conditions / parsing: Batch result parsing is too strict for valid JSON formatting, and `_init_from_resource()` now crashes on recoverable user/config errors.
- Test coverage: Missing coverage for whitespace-formatted batch JSON, invalid session-variable values, true MIME passthrough for multimodal Gemini inputs, and `ai_agg` endpoint-parity cases.
- Observability: Existing error surfaces are probably sufficient once the crash regression is fixed; no separate observability blocker found.
- Transaction / persistence / data-write correctness: Not applicable for this PR.
- FE-BE protocol: The new thrift field is added and forwarded, but FE/BE validation semantics are inconsistent for invalid values.
- Performance: The batching direction looks good, but the correctness and compatibility blockers above should be fixed before it is safe to merge.
```cpp
        std::shared_ptr<AIAdapter>& adapter) const {
    const ColumnWithTypeAndName& resource_column = block.get_by_position(arguments[0]);
    StringRef resource_name_ref = resource_column.column->get_data_at(0);
    std::string resource_name = std::string(resource_name_ref.data, resource_name_ref.size);
```
Using `DORIS_CHECK` here turns ordinary query/config errors into BE crashes. Before this patch `_init_from_resource()` returned `Status` for missing AI metadata, unknown resource names, or unsupported providers; after this change `SELECT ai_sentiment('not_exist_resource', ...)` aborts the process (the new UT even expects `ASSERT_DEATH`). These are not invariants, so this path should keep returning `Status` instead of asserting.
FE checks it first; if BE cannot find the ai_resource, it indicates a wrong/missing transmission.
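Even granting that FE validates first, the inline comment argues the BE lookup can stay a recoverable error. A minimal sketch of the Status-returning shape, using placeholder types for `Status` and the resource metadata (the real Doris types and lookup differ):

```cpp
#include <map>
#include <memory>
#include <string>

// Placeholder stand-ins for the Doris Status / adapter types discussed above.
struct Status {
    bool ok_;
    std::string msg;
    static Status OK() { return {true, ""}; }
    static Status InternalError(std::string m) { return {false, std::move(m)}; }
    bool ok() const { return ok_; }
};
struct AIAdapter {};

// A missing resource surfaces as a query error, not a process abort:
// look up the resource and return Status instead of DORIS_CHECK-ing.
Status init_from_resource(
        const std::map<std::string, std::shared_ptr<AIAdapter>>& resources,
        const std::string& name, std::shared_ptr<AIAdapter>* out) {
    auto it = resources.find(name);
    if (it == resources.end()) {
        return Status::InternalError("AI resource not found: " + name);
    }
    *out = it->second;
    return Status::OK();
}
```

If the FE-first invariant truly holds, the `Status` branch is nearly free at runtime, so keeping it costs little while protecting against a missing transmission taking down the BE.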
/review
PR approved by at least one committer and no changes requested.
run buildall
I found blocking issues in the new batching/window paths.
Findings:
- be/src/exprs/function/ai/ai_adapter.h: `build_embedding_request()` was only expanded in derived adapters. The base virtual still takes 2 arguments, so the new `override`s are invalid and existing 2-argument call sites stop compiling.
- be/src/exprs/aggregate/aggregate_function_ai_agg.h: `ai_context_window_size` now makes `ai_agg` fail on small legal settings because `handle_overflow()` re-checks the same byte limit after replacing the context with the AI summary. The new unit test with value 8 expects an impossible success case.
- be/src/exprs/function/ai/ai_functions.h: the new byte-window estimator ignores JSON escaping, so escaping-heavy prompts can still serialize above `ai_context_window_size`.
- be/src/exprs/function/ai/embed.h: EMBED batching has the same raw-size accounting problem and can exceed the configured window after JSON encoding.
Critical checkpoint conclusions:
- Goal / correctness: The PR tries to batch AI requests and expose `embed_max_batch_size` / `ai_context_window_size`. The current code does not fully achieve that goal because the embedding adapter interface is internally inconsistent, the `ai_agg` window handling regresses on low legal values, and the new byte window is not enforced against the actual serialized request size. The current tests do not prove these paths correct.
- Scope / focus: The changes stay focused on batching and session-variable plumbing, but the adapter interface change is incomplete.
- Concurrency: No new thread-safety or lock-order issue was evident in the touched code.
- Lifecycle / static init: No special lifecycle or static-initialization issue was evident in the reviewed paths.
- Configuration: The new session variables are added and forwarded FE->BE, but the BE implementations do not yet honor the configured byte window correctly in all touched paths.
- Compatibility: I did not find a storage-format or FE/BE protocol compatibility problem in the changed code, but the new C++ virtual interface contract in `AIAdapter` is currently broken.
- Parallel code paths: Scalar AI batching, `EMBED`, and `AI_AGG` were all updated, but the byte-window logic is inconsistent or insufficient across them.
- Conditional checks: The new overflow and window checks are not sufficient as implemented; see inline comments.
- Test coverage: Coverage improved for many happy paths and invalid session-variable values, but it still misses escaping-heavy batches, and one new `AI_AGG` test asserts behavior the implementation cannot provide.
- Test result changes: The modified `ai_context_window_size_session_variable_test` expectation is not correct for the current logic.
- Observability: No new observability blocker stood out.
- Transaction / persistence / data-write correctness: Not applicable for this PR.
- FE-BE variable passing: The new variables are passed through thrift correctly.
- Performance: The batching direction is good, but incorrect size accounting can still push failures to the provider instead of batching predictably.
- Other: No additional blocker beyond the items above.
Please address the inline comments before approval.
run buildall
/review
The PR’s goal is clear: batch AI scalar functions / embed requests for better performance and expose session variables to control batch sizing. The implementation is directionally good, but I found several blocking correctness issues before this can be merged.
Findings:
- Scalar AI functions now globally reinterpret any response text that looks like a JSON array as batched output, which regresses legitimate literal outputs such as `[...]` for single-row `ai_generate` / `ai_summarize` / `ai_translate` / etc.
- The new `ai_context_window_size` batching logic for scalar AI functions underestimates serialized prompt size because it ignores JSON escaping, so the configured byte window is not enforced conservatively.
- `ai_agg` now fails on the first oversized row by trying to execute the task with empty accumulated data.
- `ai_agg` still reads query-specific state from a static `QueryContext*`, so concurrent queries can bleed `ai_context_window_size`, timeout, and AI-resource state into each other.
- The added regression coverage validates `SET` constraints, but it never runs an AI query with non-default `embed_max_batch_size` / `ai_context_window_size`, so FE->thrift->BE propagation is still unproven.
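The first finding suggests the array-unpack heuristic needs batch-context gating: a single-row call should keep its literal text, and only multi-row batches should try to unpack an array. A minimal sketch of the gating condition, with hypothetical names (the real parser would still JSON-validate the array before splitting):

```cpp
#include <cstddef>
#include <string>

// Only treat "[...]" as batched output when the caller actually sent a
// batch; a single-row response like "[MSKED]" stays literal.
bool should_unpack_as_batch(const std::string& text, size_t expected_rows) {
    if (expected_rows <= 1) {
        return false; // literal output stays literal
    }
    return !text.empty() && text.front() == '[' && text.back() == ']';
}
```

Gating on `expected_rows` keeps the generic adapter layer from changing behavior for non-batched callers such as `ai_agg`.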
Critical checkpoint conclusions:
- Goal of the current task: Partially accomplished. Batching/session controls were added, but the issues above mean the new behavior is not yet correct in all supported cases.
- Is the modification small, clear, and focused: Mostly focused on AI batching/session variables, but the generic adapter/parser refactor changed behavior for all scalar AI functions.
- Concurrency: Not safe for `ai_agg`. Query-specific state still comes from a static pointer, so concurrent queries can interfere.
- Lifecycle / static state: No new cross-TU static-init issue found, but `ai_agg`'s static `QueryContext*` lifecycle remains unsafe.
- Configuration items: `embed_max_batch_size` and `ai_context_window_size` were added and forwarded, but current tests do not prove end-to-end propagation, and the scalar batching estimate does not conservatively honor the configured window.
- Incompatible changes: The new thrift fields are additive, but scalar AI response parsing is now backward-incompatible for literal JSON-array strings.
- Parallel code paths: `embed`, scalar AI functions, and `ai_agg` now each batch independently; oversized-input handling is inconsistent and `ai_agg` is incorrect on the first oversized row.
- Special conditional checks: The new `[...]` auto-unpack condition is too broad because it runs in the generic adapter layer without batch-context gating.
- Test coverage: Improved, but still missing coverage for literal JSON-array scalar outputs, escaped-size batching, first oversized `ai_agg` rows, concurrent `ai_agg` queries, and non-default end-to-end session-variable behavior.
- Has the test result been added/modified: No `.out` updates were needed here; the new regression additions currently only validate setters, not runtime effect.
- Observability: A scalar batch-size mismatch warning was added, but there is still no specific observability for `ai_agg` overflow/cross-query issues.
- Transaction / persistence / data writes: Not applicable for this PR.
- FE/BE variable passing: The new variables were added to thrift and consumed on BE paths, but end-to-end tests do not yet demonstrate that query execution actually sees the non-default values.
- Performance: Batching should help, but the escaped-size undercount can still produce oversized requests, and the `ai_agg` static-context bug can corrupt behavior under load.
- Other issues: None beyond the findings above.
Because these are correctness issues rather than tolerable nits, I'm requesting changes.
FE Regression Coverage Report: Increment line coverage
run buildall
Cloud UT Coverage Report: Increment line coverage / Increment coverage report
BE UT Coverage Report: Increment line coverage / Increment coverage report
FE Regression Coverage Report: Increment line coverage
BE Regression && UT Coverage Report: Increment line coverage / Increment coverage report
run cloud_p0
run external |
BE Regression && UT Coverage Report: Increment line coverage / Increment coverage report
FE Regression Coverage Report: Increment line coverage
Release note
Improves the performance of AI functions through batch sending. `embed` controls the number of (text/file) items sent in a single batch through the variable `embed_max_batch_size`; the remaining functions internally maintain a conservative context window. The current sending format is similar to:
performance:
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merges this PR)